@martinrohbeck martinrohbeck commented Nov 6, 2023

This PR makes sure that the metadata is checked for each view individually. Otherwise, a view without metadata is assumed to have metadata as long as the first view has some. Assuming metadata that does not exist leads to a concatenation of empty series, which breaks the code.
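The difference between the two checks can be illustrated with a minimal sketch. It uses plain pandas data frames as stand-ins for the per-view feature metadata; the function names and the toy `views` dictionary are hypothetical and are not taken from the muon code base.

```python
import pandas as pd

# Hypothetical stand-ins for per-view feature metadata (similar to AnnData.var).
views = {
    "mrna": pd.DataFrame({"highly_variable": [True, True]}, index=["g1", "g2"]),
    "drugs": pd.DataFrame(index=["d1", "d2", "d3"]),  # no metadata columns
}


def views_with_metadata_first_only(views):
    """Problematic pattern: decide for all views based on the first view."""
    first = next(iter(views.values()))
    has_metadata = first.shape[1] > 0
    return [name for name in views if has_metadata]


def views_with_metadata_per_view(views):
    """Per-view check: keep only the views that actually have metadata columns."""
    return [name for name, var in views.items() if var.shape[1] > 0]


first_only = views_with_metadata_first_only(views)
per_view = views_with_metadata_per_view(views)
print(first_only)
print(per_view)
```

With the first-view check, the `drugs` view is treated as if it had metadata; with the per-view check, it is skipped.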

Here is a small example to reproduce the bug. Computing highly variable genes (HVG) on the mrna view triggers the issue, because this operation creates metadata for the mrna view only.

import scanpy as sc
import muon as mu
import pandas as pd
import mofax as mfx

drugs = pd.read_csv("data/cll_drugs.csv", index_col=0).T
metadata = pd.read_csv("data/cll_metadata.csv", index_col=0).T
mrna = pd.read_csv("data/cll_mrna.csv", index_col=0).T

mrna = sc.AnnData(mrna)
drugs = sc.AnnData(drugs)

# Computing HVGs adds columns to mrna.var, so only the mrna view carries feature metadata.
sc.pp.highly_variable_genes(mrna, n_top_genes=10)
mrna = mrna[:, mrna.var["highly_variable"]]

mods = {"mrna": mrna, "drugs": drugs}

obs = pd.read_csv("data/cll_metadata.csv", sep=",", index_col=0)

mdata = mu.MuData(mods)
mdata.obs = mdata.obs.join(obs)

mu.tl.mofa(
    mdata,
    use_obs="union",
    n_factors=3,
    convergence_mode="medium",
    outfile="models/temp.hdf5",
    save_metadata=True,
    save_data=True,
    verbose=False,
)

model = mfx.mofa_model("models/temp.hdf5")

Note that

  1. use_var also does not work in this case (due to the added view prefix), but I'll provide a fix in another PR.
  2. a simple .copy() of the mrna data frame does not help either.

@martinrohbeck martinrohbeck requested a review from gtca November 6, 2023 19:44